Factors affecting the errors in the estimation of evolutionary distances between sequences.
Abstract
Phylogenetic methods that use matrices of pairwise distances between sequences (e.g., neighbor joining) give accurate results only when the initial estimates of the pairwise distances are accurate. For many models of sequence evolution, analytical formulae are known that estimate the distance between two sequences as a function of the observed numbers of substitutions of various classes. These are often of a form that we call "log transform formulae". Errors in these distance estimates grow as the time $t$ since divergence of the two sequences increases. For long times, the log transform formulae can give divergent distance estimates when applied to finite sequences. We show that these errors become significant when $t \approx \tfrac{1}{2}\,|\lambda_{\max}|^{-1}\log N$, where $\lambda_{\max}$ is the eigenvalue of the substitution rate matrix with the largest absolute value and $N$ is the sequence length. Various likelihood-based methods have been proposed to estimate the values of parameters in rate matrices. If the rate matrix parameters are known with reasonable accuracy, the maximum likelihood method can be used to estimate evolutionary distances while keeping the rate parameters fixed. We show that errors in distances estimated in this way become significant only when $t \approx \tfrac{1}{2}\,|\lambda_{1}|^{-1}\log N$, where $\lambda_{1}$ is the eigenvalue of the substitution rate matrix with the smallest nonzero absolute value. Likelihood-based distance estimates are therefore much more accurate than estimates based on log transform formulae, particularly when the rate matrix involves a large range of timescales (e.g., when the ratio of transition to transversion rates is large). We discuss several practical ways of estimating the rate matrix parameters before distance calculation, and hence of increasing the accuracy of distance estimates.
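To make the two timescales in the abstract concrete, the following is a minimal sketch that evaluates both thresholds for one illustrative model. It assumes the Kimura two-parameter (K2P) rate matrix, whose nonzero eigenvalues are $-4\beta$ and $-2(\alpha+\beta)$ (the latter twice), with $\alpha$ the transition rate and $\beta$ the rate of each of the two transversions. The choice of K2P, the function names, the parameter values, and the reading of $\log N$ as a natural logarithm are all assumptions made for illustration; none of them is taken from the paper.

```python
import math


def k2p_nonzero_eigenvalues(alpha, beta):
    """Nonzero eigenvalues of the Kimura two-parameter rate matrix.

    alpha: transition rate (assumed illustrative parameter).
    beta:  rate of each of the two transversions per base.
    """
    return [-4.0 * beta, -2.0 * (alpha + beta), -2.0 * (alpha + beta)]


def breakdown_time(eigenvalue, n_sites):
    """Scale t ~ (1/2) * |lambda|^-1 * log N quoted in the abstract.

    Assumes log is the natural logarithm.
    """
    return 0.5 * math.log(n_sites) / abs(eigenvalue)


def threshold_times(alpha, beta, n_sites):
    """Return (log-transform threshold, fixed-rate ML threshold)."""
    eigs = k2p_nonzero_eigenvalues(alpha, beta)
    lam_max = max(eigs, key=abs)  # largest absolute value
    lam_1 = min(eigs, key=abs)    # smallest nonzero absolute value
    return breakdown_time(lam_max, n_sites), breakdown_time(lam_1, n_sites)


# Hypothetical values: transitions ten times faster than each transversion.
t_log, t_ml = threshold_times(alpha=5.0, beta=0.5, n_sites=1000)
print(f"log-transform threshold: t ~ {t_log:.3f}")
print(f"fixed-rate ML threshold: t ~ {t_ml:.3f}")
print(f"ratio of thresholds:     {t_ml / t_log:.1f}")
```

Under these assumptions the ratio of the two thresholds is $|\lambda_{\max}|/|\lambda_{1}| = (\alpha+\beta)/(2\beta)$ whenever $\alpha > \beta$, so it grows with the transition-to-transversion ratio and does not depend on $N$. When $\alpha = \beta$ the two eigenvalues coincide and the thresholds are equal.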
Journal: Molecular Biology and Evolution
Volume 20, Issue 1
Publication date: 2003